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Abstract 

We propose a new method to test the effectiveness of a spatial 
point process forecast based on a log-likelihood score for predicted 
point density and the information gain for events that actually oc- 
curred in the test period. The method largely avoids simulation use 
and allows us to calculate the information score for each event or set 
of events as well as the standard error of each forecast. As the num- 
ber of predicted events increases, the score distribution approaches the 
Gaussian law. The degree of its similarity to the Gaussian distribution 
can be measured by the computed coefficients of skewness and kurto- 
sis. To display the forecasted point density and the point events, we 
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use an event concentration diagram or a variant of the Error Diagram 
(ED). 

We demonstrate the application of the method by using our long- 
term forecast of seismicity in two western Pacific regions. We com- 
pare the ED for these regions with simplified diagrams based on two- 
segment approximations. Since the earthquakes in these regions are 
concentrated in narrow subduction belts, using the forecast density as 
a template or baseline for the ED is a more convenient display tech- 
nique. We also show, using simulated event occurrence, that some 
proposed criteria for measuring forecast effectiveness at EDs would be 
strongly biased for a small event number. 

Key words: Probabilistic forecasting; Spatial analysis; Fractals 
and multifractals; Probability distributions; Earthquake interaction, 
forecasting, and prediction; Statistical seismology. 
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1 Introduction 



This paper continues our analysis of stochastic point process forecast ver- 
ification (Kagan 2007b). There (ibid.) we had discussed two interrelated 
methods for measuring the effectiveness of earthquake prediction algorithms: 
the information score based on the likelihood ratio (Kagan 1991) and the 
"Error Diagram" (ED). These methods have been applied (Kagan 2007b) to 
temporal renewal stochastic processes, but only for very long processes with 
the number of events approaching infinity. 
In this work we extend our analysis by 

1) discussing spatial (not temporal) random processes (fields); 

2) considering forecast testing if the number of events is relatively small; 

3) applying newly developed techniques to long-term earthquake forecasts. 
Two issues are related to the problem of testing point process forecasts: 

• 1) Spatial random point fields density evaluation and its prediction is a 
mature discipline with many publications. Baddeley et al. (2005), Baddeley 
(2007), Daley and Vere- Jones (2003, 2008) provide reviews. As we explain 
below, the earthquake forecasting problem is different in many respects from 
regular density evaluation and requires special treatment. However, some 
results of this paper can be applied to test the forecast of a random spatial 
pattern. 

• 2) Well-developed application methods exist in weather and climate pre- 
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diction; their reviews have been recently published by Jolliffe & Stephenson 
(2003), Palmer & Hagedorn (2006), DelSole k Tippett (2007). These pre- 
diction methods consider continuous processes and fields; however, with nec- 
essary modifications some of these methods can be used for stochastic point 
processes. 

Our main focus here is on the two most widely used approaches to as- 
sessing earthquake prediction methods (Zaliapin & Molchan 2004). Both 
approaches evaluate how a prediction method reveals new information about 
impending earthquake activity. The first approach starts by estimating the 
expected spatio-temporal distribution of seismicity and uses the classical 
likelihood paradigm to evaluate predictive power. Accordingly, it uses the 
nomenclature of statistical estimation. The second one applies the results by 
Molchan (1990, 1997; see also Molchan & Keilis-Borok 2008) who proposed 
error diagrams for measuring prediction efficiency The EDs plot the nor- 
malized rate of failures-to-predict {y) versus the normalized time of alarms 
(r). The ED can be considered as a time-dependent analog of the Neyman- 
Pearson lemma on making a decision: should we expect an earthquake within 
a given spatio-temporal region? Consequently, it uses the language of hy- 
pothesis testing. 

The error diagram is related to the Relative Operating Characteristic 
(ROC) (Swets 1973; Mason 2003, pp. 66-76), used in signal detection and 
weather prediction efforts. In the ROC diagrams the success rate of an event 
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prediction is compared against the false alarm rate. 

Starting with Molchan's (1990) paper, previous EDs were almost exclu- 
sively time-dependent. We apply the ED to time- independent spatial earth- 
quake distributions. In some respects, the earthquake spatial pattern is more 
difficult to analyze than the temporal distribution. In the latter case, we have 
a reasonable null model (the uniform in time Poisson process) which can be 
compared to any test model. In the spatial case, the simple model of uni- 
formly distributed seismicity can hardly serve as an initial approximation; 
even large earthquakes (which often can be well-approximated by a Poisson 
temporal process) are strongly clustered in space. This property of seismic- 
ity is caused by the fractal nature of earthquake spatial distribution (Kagan 
2007a). Although in our forecast (Kagan & Jackson 2000) we use a pro- 
jection of earthquake centroids on the Earth surface which smoothes their 
spatial distribution (Kagan 2007a), the spatial distribution still preserves a 
self-similar fractal pattern with large parts of the Earth practically aseismic. 

Diagrams similar to EDs have been used previously to describe spatial 
distribution of seismicity: Rong & Jackson (2002, their Fig. 3); Kagan et ai, 
(2003, their Fig. 5.3); Helmstetter et ai, (2007, their Fig. 4); and Shen et 
ai, (2007, their Figs. IB, 2B) created spatial "Concentration Diagrams" to 
characterize the agreement (or lack thereof) between the predicted seismicity 
distribution and future earthquakes. These diagrams plot the fraction of 
the event success rate (equivalent to 1 — v) versus the normalized area (t), 
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sorted by probability density. The sorting is largely analogous to water-level 
threshold analysis (Zechar & Jordan 2008). These concentration diagrams 
can easily be converted to EDs by adding an ascending diagonal and then 
reflecting the plot in the line ordinate (v — 1/2). 

In principle, such diagrams can be treated like ROC plots where cells with 
events are considered as success and empty cells as false alarms. However, 
this interpretation encounters difficulties when the cells are not infinitesimally 
small, so some may contain more than one event. Moreover, as we show 
below, for a point process on a sphere it is difficult to define cells of equal 
size. Usual sphere subdivision yields unequal cells larger at the equator and 
smaller towards the poles. 

Characterizing prediction performance is a major challenge for ED anal- 
ysis. Since prediction results are represented by a function (curve), it is 
important to find a simple one-parameter criterion (a functional) that briefly 
expresses the efficiency value. Several functionals have been proposed, each 
with some advantages or disadvantages. For example, Kagan & Jackson 
(2006, their Section 5) show that two ED trajectories with a very different 
behavior have the same "Sum of Errors" [y + r) value, which often is pro- 
posed as a measure of ED forecast efficiency (Kossobokov 2006; Molchan & 
Keilis-Borok 2008). 
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2 Long-term earthquake forecasts 

Kagan & Jackson (1994, 2000) present long-term and short-term earthquake 
forecasts in several regions using the CMT catalog (Ekstrom et al. 2005; 
http://www.globalcmt.org/). The forecasted earthquake rate is calculated 
as a spatially smoothed earthquake distribution. The spatial kernel used in 
the smoothing has units of earthquakes per unit area and time. In our studies 
it applies to all shallow (depth less or equal 70 km) earthquakes with moment 
M = 10 17 ' 7 Nm (magnitude 5.8) and greater. The kernel is elongated along 
the fault-plane, which is estimated from available focal mechanism solutions. 

To take into account the impact of earthquakes outside the forecasted 
regions boundaries, we consider events up to 1000 km outside the region 
window. The rate density from those earthquakes is added to the forecast 
density of 'inside' events applying the kernel estimates. This additional prob- 
ability density from outside events is on average balanced by a contribution 
'leakage' from many 'insider' earthquakes close to the boundaries. 

An important feature of Kagan & Jackson's (1994) method is a jack- 
knife like procedure (Silverman 1986) for testing the predictive power of the 
smoothing. It optimizes the kernel parameters choosing those values which 
best predict the second half of a catalogue, using a maximum likelihood 
criterion, from the first half. We argue that because the seismicity pattern 
exhibits a long-term clustering (Kagan & Jackson 1991), such a procedure is 
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better suited to predict the future earthquake rate. We also assume on an 
ad hoc basis (Kagan & Jackson 1994, 2000) that the background probability 
density is uniform over the whole region and integrates to 1% of the total 
earthquake rate 

(Background rate) = e x (Total rate) , (1) 

with e = 0.01. 

Kagan (2007a) shows that the fractal dimension of earthquake hypocen- 
ters, S, strongly depends on the earthquake catalog time interval. For tem- 
porally short catalogs 5 is close to zero and approaches the asymptotic value 
5 « 2.3 for catalogs of decades length. In a more intuitive setting, this result 
signifies that in short time intervals, hypocenters are concentrated in a few 
point clouds. With increased time, seismicity spreads over a fault system or 
seismic belt length, eventually occupying a set with the dimension in excess 
of the 2-D plane. Therefore, if one uses a set of earthquake epicenters in 
a relatively short catalog to predict the future seismicity rate, the optimal 
forecast kernel should spread beyond the presently available event points, 
i.e., to be smoother than the standard density estimators (Silverman 1986) 
would suggest. 

The forecasts are expressed as the rate density (that is, the probability 
per unit area and time). They are updated every day and posted for two 
western Pacific regions at 
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http:/ /scec. ess. ucla.edu/~ykagan/predictions_index. html (see FORECAST 



TEST FOR 2004-2006:). Table ffl displays a small extract of the forecast 
tables available at the Web site. In this Table we sorted (ordered) entries by 
the values of earthquake forecast densities. 

In Fig. [T] we display the long-term forecast map computed for the north- 
west (NW) Pacific region using the CMT catalog for 1977-2003. Shallow 
earthquakes in 2004-2006 are shown in white. Similar maps for NW and 
southwest (SW) Pacific are shown, for instance, in Figs. 8a,b by Kagan & 
Jackson (2000). On visual inspection, the model predicts the spatial dis- 
tribution of seismic activity reasonably well. We tested this forecast by a 
Monte-Carlo simulation (Kagan & Jackson 1994). 

In Fig. [2] we show the forecasted earthquake density with 10 sets of syn- 
thetic catalogs, each having 108 events. Earthquakes are assumed to occur 
at the centers of grid cells; therefore some of the grid points are occupied 
by more than one event. Some of the simulated points occur in areas of low 
seismicity (compare Fig. [2] with Fig. [I]). As mentioned above, this feature of 
the forecast is used to prevent surprises, i.e., an occurrence of earthquakes 
in zones where no nearby events happened in 1977-2003. 

Table [2] summarizes annual earthquake rates for both western Pacific 
regions. Because events outside the region's boundaries have influence, the 
rates calculated through the smoothing procedure and evaluated by a direct 
method (dividing the earthquake numbers by time interval) are close but do 



not coincide. 

3 Log-likelihood 

Kagan & Knopoff (1977, see also Vere- Jones 1998) suggested measuring the 
effectiveness of the earthquake prediction algorithm by first evaluating the 
likelihood ratio to test how well a model approximates an earthquake occur- 
rence. In particular, they estimated the information score, /, per one event 



a catalog, log 2 is used to obtain the score measured in the Shannon bits of 
information, pi is the probability of earthquake occurrence according to a 
stochastic model, conditioned by the past: 



where I(t) is the past history of the process up to the moment t, and £j is a 
similar probability for the event occurrence according to the Poisson process. 

The Poisson process rate can be calculated by normalizing the seismicity 
level in the forecast regions. Several rates, such as shown in Table [2j can be 
used in the normalization. To make our results comparable to the forecast 
rate density, we use V\ values 



by 




-likelihood ratio, n is the m 



Pi = Prob { an event in (t, t + A)\ I(t)} , 



(3) 
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where v% is the annual rate of earthquakes in each region in 1977-2003 (Ta- 
ble EJ, U and 9i are the upper and lower latitudes, respectively, <fi u and <pi 
ditto for longitudes. For the NW-Pacific region £j = 2.6289 x 10 -9 eq/ (day 
x km 2 ); for the SW-Pacific & = 3.3479 x 10~ 9 eq/(day x km 2 ). 

Several methods can be used in calculating the information score for a 
set of forecasted events. Using the forecasted rate values (Aj for cell centers 
in which earthquakes occurred) we compute 

1 n \ 

h = (5) 

where n is the number of events. 

In Eq. [5] and in derivations below, we assume that earthquakes in the cells 
are identically distributed independent (i.i.d.) events. The assumed indepen- 
dence may be challenged by the clustered nature of earthquake occurrence of 
which foreshock-mainshock-aftershock sequences are the most clear example 
(Kagan & Knopoff 1977; Kagan 1991). However, given the high magnitude 
(5.8) threshold for the CMT catalog, the clustering is less pronounced. The 
dependent events on average constitute only about 0.2 of the total seismic 
activity (Kagan & Jackson 2000, Eq. 23). Thus, we expect that earthquake 
statistical inter-dependence would have relatively small influence. 

As another option, instead of §5§ we compute the information score for 
the actual epicenter (centroid) locations (A&) 



In simulated catalogs we generate multiple (TV = 10, 000) sets of n 2 events 
(Table [2]) and calculate the rate for cell centers as the earthquake location 
(see Fig. ED 

1 ™ 2 X 



and 
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< J 3>= T?E( J 3)^ (8) 
■ /v €=1 

This method has an advantage in that we do not need to calculate the rate 
densities again, as for I 2 , but instead use the previously computed forecast 
tables (as shown in Table [1]) to evaluate the scores. 

In Fig. [3] we display the log-likelihood function distribution differences 
for the simulation as shown in Fig. [2j We simulate n 2 earthquake locations 
according to 1977-2003 forecasts for each region. Each time we calculate 
the log-likelihood function and subtract the log-likelihood function value ob- 
tained for the real catalogue in 2004-2006. Thus, we display the histogram 
of h - I 2 (Eqs.HandED. 

4 Error diagrams 

To test the long-term forecast efficiency numerically, we calculate the con- 
centration diagram. To make these diagrams, we divide the region into small 
cells (0.5 by 0.5 degrees) and estimate the theoretical forecast rate of earth- 
quakes above the magnitude threshold for each cell. We then count the events 
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that actually occurred in each cell, sort the cells in the decreasing order of 
the theoretical rate, and compute the cumulative values of forecast and the 
observed earthquake rates (see Table [[]). Similar plots have been used in 
several of our papers (Kagan et al. 2003; Helmstetter et al. 2007; Shen et al. 
2007). In effect, these diagrams are equivalent to the error diagrams (EDs) 
proposed by Molchan (1990, 2003) and Molchan & Kagan (1992). But in 
this case we use the normalized spatial area, not time, as the horizontal axis. 

4.1 Relation between the error diagram and informa- 
tion score 

We illustrate the ED by a sketch in Fig. HI For the spatial point distribution, 
this example is easier to construct and explain than for temporal renewal 
processes (Kagan 2007b). The square's diagonal corresponds to the uniform 
Poisson distributions of the points in a region, i.e., a random guess forecast 
strategy. As a test example, we assume that the region consists of three 
sub-areas, their surfaces is 0.1, 0.5, and 0.4 of the total, and the number 
of events z/j is 0.4, 0.5, and 0.1, in each zone respectively. The points in 
these zones are distributed according to the Poisson spatial process with the 
density Vijn. Then, the information score for such a point distribution can 
be calculated as (see Eq. [2]) 

I = ut log 2 - = 0.4 log 2 4.0 + 0.5 log 2 1.0 + 0.1 log 2 0.25 

i=l r » 

= 0.8-0.2 = 0.6. (9) 
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For the normalized point Poisson distribution in the ED, the point density 
is unity. Hence its contribution to the information rate ([9]) is zero. 

The information score can be calculated for continuous concave curves in 
an error diagram (Kagan 2007b) 

I = (10) 

If the ED consists of several linear segments (as in Fig. HJ), then (fTOl) converts 
to 

h = f>log 2 p) , (11) 

i=l v r » ' 

where % are cell numbers, N is the total number of grid points, and v% and r% 
are the normalized rates of occurrence and cell area: 

Ri , Si 



N D aIld T i = a > ( 12 ) 



Y^ JV P Y^ JV Q 

see Table [TJ When such calculations are made for a spherical surface (as 
in Figs. [Ul2]), the r< steps are usually unequal in size, unless a special effort 
is made to partition a sphere into equal-area cells (see more in Kagan & 
Jackson 1998). This cell inequality complicates the calculation. 

Figs. [5] and [6] show the EDs for both Pacific regions. The red curves are 
for the forecast, based on 1977-2003 seismicity, and the blue curves are for 
the earthquakes which occurred in these regions from 2004-2006. Both sets 
of curves are calculated using the forecast tables like those in the example 
(Tabled]). In principle, the calculations such as in ( fl~2]) can be made with 
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unordered cells. The density ordering in Table [T] and Figs. EJ M is performed 
to create the ED diagrams. 

The score values Iq in Table [3] are calculated using the distribution shown 
by the red curves in Figs. El The J values for NW- and SW-Pacific 
indicate that the forecast yields an information score higher than 2-3 bits 
per event. This means that on average the probability gain (G) is a factor 
of 5 to 10 (2 2,36 to 2 3,38 ) when using the long-term forecast compared to a 
random guess. Of course, these Jo values do not fully describe the forecast 
advantage. The boundaries of both regions have already been selected to 
contain the maximum number of earthquakes in relatively small areas. If we 
extend any of the regions toward the seismically quiet areas, the information 
score would significantly increase. The proper measure of long-term forecast 
effectiveness would extend the forecast method globally, i.e., over the whole 
Earth surface. Limited numerical experiments suggest that depending on 
degree of smoothing, the value of e (Eq. [[]), and other factors, the G- value 
for world-wide seismicity varies from about 10 to 25. 

The above values of the probability gain, G, can be compared with similar 
calculations by Rhoades & Evison (2005, 2006), Console et al. (2006), and 
Rhoades (2007). These authors calculated the information rate per earth- 
quake for a model of smoothed seismicity (PPE), similar to our long-term 
model. The PPE model was compared to a stationary and spatially uniform 
Poisson (SUP) model. The probability gain, computed using the information 
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rate, for New Zealand, Japan, Greece, and California is about 4.5, 1.6, 1.6, 
and 3.4, respectively. These relatively small gain values are related with the 
choice by the authors of the regions that include only seismically active areas 
(see ibid.). Helmstetter et al. (2007, Table 1) obtained for different long-term 
seismicity predictive models in California the G- values ranging from 1.2 to 
4.8. 

The ED curves for earthquakes in Figs. [5l [6] are similar to the forecast 
earthquake curves. The computation of the likelihood scores (0 M> shows 
that the NW earthquakes have a better score than the forecast, whereas SW 
events display the opposite behavior (see also Fig. [3]). The scores using the 
actual centroid position (^2) are larger than those for the cell centers {h), 
an anticipated feature. Similarly, Table [3] shows that the average scores for 
synthetics (< ^3 >) are very close to those of Iq, which is understandable, 
since the simulation runs are extensive (see Eqs. [3, [8]). 

Fig. [7] shows the frequency curves for the log-likelihood function of both 
western Pacific regions. We display log 2 of the normalized rate (see column 5 
of Tabled]) against the normalized cumulative area of the cells (column 4). 
Curves for both regions exhibit high values of the rate (Ri) concentrated in 
a relatively small fraction of area. Low values at the right-hand end of the 
diagram correspond to the assumed uniform background probability density 
(Section [21 see also Figs. EDT2]). 

We calculate the higher order moments for the error curve (Iq of Eq. [TTJ 
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corresponds to the first moment /ii) 

/ Us \ 1 & 

(13) 



i=l 



!og2 ( — ) - J o 



r,- 



where fc = 2, 3, 4, .... 



The standard deviation of the log-likelihood for the set of n events is 



On = VW« 2 , (14) 

where n,2 is the earthquake number during 2004-2006 (see Table [2]). The 
coefficient of skewness is 

V = /^s//4 /2 , (15) 

and coefficient of kurtosis is 

ip = // 4 //^ - 3. (16) 

These coefficients characterize how the likelihood curve differs from the Gaus- 
sian distribution; for the latter law both coefficients should be zero. The 
Central Limit theorem states that for large numbers of i.i.d. events their dis- 
tribution should approach the Gaussian law. If the event number is small, 
we need to find an efficient way to numerically approximate the distribution 
of the sum of i.i.d. random variables. 

In Table [3] both coefficients are large for one event likelihood curve (see 
also Fig. [7]), but for the set of n events they are small: the distribution is 
close to the Gaussian law as demonstrated in Fig. [3j The difference between 
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the score values J to J 2 is less than the standard error value (see Table [3]). 
Thus both forecasts can be considered statistically successful. 
The difference 

I' = I -h or I" = I -I 2 , (17) 

shows the predictive efficiency of a forecast, i.e., whether on average earth- 
quakes in 2004-2006 occurred at the sites listed in the prediction table (see 
an example in Table [T]). For this particular time interval, both forecasts 
are sufficiently good. However, as other examples (Kagan & Jackson 2000, 
Fig. 9; Kagan et al. 2003, Fig. 5.2) demonstrate, this is not always the case. 
The values of differences (negative for the NW-Pacific and positive for the 
SW-Pacific) correspond to those simulations in Fig. [31 where we display the 
distribution of the difference I 3 — I 2 . 

By applying ffTTj) to the blue curve of earthquakes in 2004-2006 in Figs. [6] 
we evaluate the information score 
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— P i lo §2 



(18) 



(see Table |3]). The value of 1^ is obviously significantly larger than all the 
other estimates of the score. Earthquake simulations provide an explanation 
for this feature (see Fig. [TOl below) . 
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4.2 Two-segment error diagrams and information score 

Similarly to Fig. 5 in Kagan (2007b), in Fig. [S] we display the approximation 
of the ED for the NW-Pacific by several two line segment diagrams with the 
same value of the information score, Iq- 

For the assumed information score /, the contact point of two segments 
is defined by the equation (corrected Eq. 22 by Kagan 2007b) 



D 1 



2 1 . (19) 



v-l-D 1 

By solving this equation for any value of the first segment slope D\ (non- 
positive by definition), one obtains the v- value for the contact point of two 
linear segments, r = {y — 1)/D 1 . 

The first of these curves has the second segment coinciding with the 
abscissa axis. This means that one can obtain the same information score by 
concentrating all the points in the 2~ /() = 0.194 part of the region. However, 
though the /-value for such a pattern would be 2.36 bits, all points would have 
the same value of the probability gain. Hence, for such a likelihood value 
distribution, the variance and higher-order moments would be zero: very 
different from the actual probability gain pattern (Tabled]). If we modify 
the two-segment model to distribute the events with different densities over 
the whole area, the variance and the other moments would be non-zero. 

In Fig. we show the dependence of the lower-order moments for the 
likelihood score on the Di slope. For D\ = —2 x 2 1 (dashed magenta line, fifth 
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curve from the bottom) the 2nd, 3rd, and 4th moments correspond roughly 
to the moments of the forecasted densities. Thus, such a two-segment model 
would reasonably well approximate the actual event distribution. 

The contact coordinates of two segments for this curve are: u 5 = 0.1732 
and T5 = 0.0803. Therefore, the point pattern having apparently the same 
lower-order moments as the actual earthquake forecast would have about 
83% of points concentrated in 8% of the area, i.e., the point density will 
be 10.3 times higher than the uniform Poisson rate. The rest of the events 
would be distributed in 92% of the area and have the rate of 0.19 compared to 
the uniform Poisson distribution. As we mention in Section [2j in our Pacific 
forecast 0.01 part of the total earthquake rate is spread over the entire region 
(see Eq. [Hand Figs. [HE]). 

4.3 Information score for 1977-2003 CMT and PDE 
catalogs 

ED displays in Figs. [5j [6] are inconvenient since the most interesting parts of 
the curves are concentrated near v— and r-axes. The reason for this feature 
is that seismicity is concentrated in relatively narrow seismic belts having a 
fractal spatial earthquake distribution. Now we focus on how other curves 
deviate from the forecasted (red) one. To make these deviations show more 
prominently, we need to display the curves in a relative abscissa format, using 
the 1977-2003 forecast density as a template or baseline for the likelihood 
score calculation. 
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Fig. [TU] shows several curves in a new format; in effect we convert the 
red curve in Fig. [5] to the diagonal. This is equivalent to calculating the 
information scores by using Aj reference density 

I w =-$>log 2 f, ( 2 °) 

where Q is a rate density for all the other point distributions. Fig. [10] shows 
the difference between the forecast curve (red) and the earthquake curve 
(blue) better than Fig. [5] 

Fig. [10] also displays the curve for the 1977-2003 CMT catalog. The num- 
bers of events in the cell areas are shown in Table [I] column 8. Also shown 
is the curve for the PDE catalog (U.S. Geological Survey 2008) for 1968- 
2006. We obtain I x = 3.5991 bits/event for the 1977-2003 CMT catalog 
and Ji = 2.9789 bits for the PDE. These values are significantly larger 
than those forecasted for 2004-2006. Therefore, our forecast predicts better 
locations of past earthquakes than those of future events. Why this paradox? 
In our forecast we use a broader smoothing kernel to capture the spread of 
seismicity with time (Section [2]). Had we used the standard density esti- 
mation methods (Silverman 1986), the optimal kernel width would likely be 
smaller, but such a smoothing would not effectively forecast future seismicity. 
A similar explanation is apparently valid for the PDE score value. Helmstet- 
ter et al. (2007, Table 1) obtained G = 7.1 (significantly higher than the 
G-values for predictive algorithms) when the same data were used to build 
the long-term seismicity model and to test it (see Section fl~TT) . 
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In Fig. [TD] we also show several curves for the simulated earthquakes. 
These curves explain why the /4-value (1181) is significantly larger than the 
other measures of the information score. The reason is twofold. First, the 
number of events in the 3-year interval is relatively small and the curves 
often fluctuate around the expected value (the red curve). These fluctuations 
increase the sum value in (1181) . The curves are often below the red forecast 
line, which would usually cause the score value to increase. Second, the ED 
curve should be concave (Molchan 1997; 2003). /^values, listed in Tabled 
are calculated with the original curves shown in Figs. [6] which have many 
convex sections. If we make a lower envelope of the curve points, this would 
decrease the /4-value. However, our numerical experiments show that the 
decrease is not significant enough to bring the value sufficiently close to the 
Iq score. 

The fluctuations of the synthetic curves also suggest that some strategies 
proposed to measure the effectiveness of a prediction algorithm by considering 
the ED, like a sum of errors (v+t) or minimax errors (Molchan 1991; Molchan 
& Kagan 1992; Kossobokov 2006; Molchan & Keilis-Borok 2008) are biased 
for a small number of forecasted events. For western Pacific regions the 
number of predicted events (n 2 ) is relatively large; in many other applications 
of the ED (ibid.) this number is less than 10. 

In Fig. [TU] the forecast distribution curve is used as the template. Thus, 
we can measure the difference between this line and the other curves using 
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many standard statistical techniques, like the Kolmogorov-Smirnov test, the 
Cramer- von Mises, etc., (Stephens 1974) to infer whether these distributions 
are statistically different. 

5 Discussion 

Several information scores are displayed in Table [3j Although these scores 
appear different, the difference is caused either by the small event number or 
a small number of simulations. The following limits can be easily conjectured 

I = lim < h > , (21) 

(see Eq. ED • In Table [3] the difference between these two scores is small due 
to the large number of simulations. Similarly, 



[cf. Eq.QI]). Also 



I = lim J , or I = lim I„ , (22) 

|SiH0 iV->oo v ' 



h = lim I 2 , (23) 

|Si|-0 



(see Eqs. ElED- 

In addition, if the model of the long-term forecast is correct, then 



7i = lim /, and h = lim J , (24) 

ri2— >oo U2— »oo 



(see Eqs. U M 
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In this paper we wanted to extend statistical analysis of the stochastic 
point processes on line (usually time) to multidimensional space. In par- 
ticular, we wished to find the relation between two widely used statistical 
measures of prediction efficiency: likelihood scores and error diagrams. The 
equations derived here can be easily transformed to describe quantitative 
connection between the information scores and concentration diagrams (Sec- 
tion HJ. 

Summarizing our results, we list the following major points: 

• 1. As with temporal stochastic processes (Kagan 2007b), we find forward 
and inverse relations between the information score and the error diagram 
curve for point spatial fields. The error diagram represents a much more 
complete picture of the stochastic point process than does likelihood analy- 
sis. 

• 2. Since we are using a Poisson process to represent the long-term spa- 
tial point pattern, the resulting models are easier to visualize and calculate. 
However, the assumption of earthquake statistical independence and its in- 
fluence on the information score value both need to be investigated. 

• 3. We extend our analysis for relatively small samples of events and show 
that for such samples we should modify some of the testing criteria proposed 
for error diagrams. 

• 4. We show that the forecasting blueprint for estimating future earthquake 
point density differs from standard methods of statistical density evaluation. 
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Nevertheless, the connection between the likelihood score and error diagrams 

described above can be used in many density estimation problems. 

• 5. We show that for testing the long-term forecast, it is sufficient to process 

the forecast table to obtain the error diagram and most information scores. 

Thus, the simulation which was used in previous work, and which requires 

significant computational resources, can be avoided in most cases (Rhoades 

2008). 
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Table 1: Beginning and end of earthquake rate forecast table for NW-Pacific 
based on CMT catalog (Ekstrom et al. 2005) for 1977-2003 and ordered by 
descending rate density (column 3, A). Cells are 0.5° x 0.5°, they form a 
121 x 121 grid, EQ - earthquake(s). 





Time 
Interval 


Pacific Regions 


NW 


SW 


Annual Rate 


Vl 


77-03 Forecast 


35.7159 


60.7509 




77-03 EQs 


968 


1591 


v 2 


77-03 EQs 


35.8546 


58.9304 


n 2 


04-06 EQs 


108 


170 


V3 


04-06 EQs 


35.9918 


56.6537 



Table 2: Annual earthquake rate estimates. Actual rate calculations are 
made with time intervals measured in days (9861 days in 1977-2003 and 
1096 days in 2004-2006). For display convenience, we convert the daily rates 
into annual rates by multiplying them by 365.25. EQs - earthquakes. 
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Table 3: Information scores for one event in west Pacific regions, the standard 
error (a) and coefficients of skewness (77) and kurtosis (ip) (I14H16I) . The 
numbers in parentheses are event counts in 2004-2006 for each region, n 2 . 
Variables a n , i] n , and ip n are for the set of n 2 events. 
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Figure 1: NW-Pacific long-term seismicity forecast: latitude limits from 
0.25°S to 60.25°N, longitude limits from 109.75°E to 170.25°E. The fore- 
cast is calculated at 121 x 121 grid. Colour tones show the probability of 
shallow (depth less or equal to 70 km) earthquake occurrence calculated us- 
ing the CMT 1977-2003 catalog; 108 earthquakes for 2004-2006 are shown in 
white. The uniform background probability density (e = 0.01, see Eq. [I]) can 
be observed at northwest and southeast corners of the map as greyish-green 
areas. 34 
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Figure 2: NW-Pacific long-term seismicity forecast. Colour tones show the 
probability of earthquake occurrence calculated using the CMT 1977-2003 
catalog; 1080 simulated earthquakes for 2004-2006 are shown in white. 
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Fig. 3 

0.45 1 1 ! ! ! 1 — 




Log Likelihood Ratio (o) 

Figure 3: Histograms of the log-likelihood function differences for 2004-2006 
simulated earthquakes (see Fig. [2]). The functions are normalized to have 
a unit standard deviation. We simulate 10,000 sets of 108 events for the 
NW-Pacific and of 170 events for the SW-Pacific. The blue line is the Gaus- 
sian curve with a zero mean and unit standard deviation. Red curve corre- 
sponds to simulation distributions for NW-Pacific; green curve to SW-Pacific. 
Curves on the right from the Gaussian curve correspond to simulations that 
are worse than a real earthquake distribution; curves on the left correspond 
to simulations that are better than a real earthquake distribution. 
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Figure 4: Error diagram (r, v) example. 
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Fig. 5 
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Figure 5: Error diagram (r, v) for NW-Pacific long-term seismicity forecast. 
The forecast is calculated at 121 x 121 grid. Solid black line - the strategy 
of random guess, red line - the ordered density for long-term forecast, blue 
line - earthquakes in 2004-2006. 
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Figure 6: Error diagram (r, v) for SW-Pacific long-term seismicity forecast: 
latitude limits from 0.25°N to 60.25°S, longitude limits from 109.75°E to 
169.75°W. The forecast is calculated at 121 x 161 grid. Solid black line - 
the strategy of random guess, red line - the ordered density for long-term 
forecast, blue line - earthquakes in 2004-2006. 
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Figure 7: Likelihood function distribution for west Pacific long-term seismic- 
ity forecasts: red - NW-Pacific, blue - SW-Pacific (see boundaries in Figs. [U 

ED. 



38 



Fig. 8 
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Figure 8: Error diagram (r, v) for NW-Pacific long-term seismicity forecast, 
approximated by two-segment distributions. The solid thick black straight 
line corresponds to a random guess, the thick red solid line is for the NW 
forecast. Thin two-segment solid lines are for the curves with the information 
score Jo = 2.3645 bits. The slope D\ for the right-hand first segment is 
D\ = —2 Iq . For the next first segments slopes are D\ x 1.1, D\ x 1.25, 
Di x 1.5, Dx x 2.0, L>i x 3.0, D x x 5.0, D x x 10, D x x 50, B x x 100, D x x 250, 
L>! x 1000, L>i x 10,000, L>! x 100,000, and D x x 1000,000. 
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Fig. 9 




Figure 9: Dependence on slope \D\\ of standard deviation (magenta), co- 
efficients of skewness (blue) and kurtosis (green) for two-segment curves in 
Fig. [HJ Horizontal lines are these variables for the red curve in the cited plot. 
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Fig. 10 
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Figure 10: Error diagram (r, v) for NW-Pacific long-term seismicity forecast. 
Solid black line - the strategy of random guess. The solid thick red diagonal 
line is a curve for the NW forecast. Blue line is earthquake distribution 
from the CMT catalog in 2004-2006 (forecast); magenta line corresponds 
to earthquake distribution from the CMT catalog in 1977-2003; cyan line is 
earthquake distribution from the PDE catalog in 1968-2006. Thin green lines 
are ten simulations, displayed in Fig. [21 the first realization is shown by a 
thick green line. 
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